 stacked generalization


Demand Prediction Using Machine Learning Methods and Stacked Generalization

arXiv.org Artificial Intelligence

Supply and demand are two fundamental concepts of sellers and customers. Predicting demand accurately is critical for organizations so that they can plan. In this paper, we propose a new approach for demand prediction on an e-commerce web site. The proposed model differs from earlier models in several ways. The e-commerce web site for which the model is implemented operates a marketplace model, in which many sellers offer the same product at the same time at different prices. Demand prediction for such a model should consider the price of the same product sold by competing sellers along with the features of these sellers. In this study we first applied different regression algorithms to a specific set of products from one department of a company that is one of the most popular online e-commerce companies in Turkey. Then we used stacked generalization, also known as stacking ensemble learning, to predict demand. Finally, all the approaches are evaluated on a real-world data set obtained from the e-commerce company. The experimental results show that some of the machine learning methods produce results almost as good as those of the stacked generalization method.


Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods

arXiv.org Machine Learning

This study uses stacked generalization, a two-step process for combining machine learning methods through meta or super learners. Step one improves the performance of the individual algorithms (by minimizing the error rate of each algorithm to reduce its bias in the learning set). Step two feeds their results into the meta learner, whose stacked, blended output demonstrates improved performance, with the weakest algorithms learning better. The method is essentially an enhanced cross-validation strategy. Although the process uses great computational resources, the resulting performance metrics on resampled fraud data show that increased system cost can be justified. A fundamental key to fraud data is that it is inherently not systematic and, as of yet, the optimal resampling methodology has not been identified. Building a test harness that accounts for all permutations of algorithm and sample-set pairs demonstrates that the complex, intrinsic data structures are all thoroughly tested. Using a comparative analysis on fraud data that applies stacked generalizations provides useful insight needed to find the optimal mathematical formula to be used for imbalanced fraud data sets.
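
The two-step process described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the study's pipeline: the two base learners (a straight-line fit and a 1-nearest-neighbour rule), the fold assignment, the synthetic data, and all function names are hypothetical choices made for the example, and no resampling of imbalanced data is shown.

```python
import random


def fit_line(xs, ys):
    """Base learner A: ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    """Base learner B: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def out_of_fold(fitters, xs, ys, k=5):
    """Step one: every row is predicted by models that never saw it."""
    n = len(xs)
    preds = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(tx, ty)
            for i in range(fold, n, k):  # rows held out in this fold
                preds[i][j] = model(xs[i])
    return preds


def fit_meta(preds, ys):
    """Step two: least-squares weights (no intercept) for two base learners."""
    s11 = sum(p[0] * p[0] for p in preds)
    s12 = sum(p[0] * p[1] for p in preds)
    s22 = sum(p[1] * p[1] for p in preds)
    t1 = sum(p[0] * y for p, y in zip(preds, ys))
    t2 = sum(p[1] * y for p, y in zip(preds, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # base predictions are collinear: fall back
        return 0.5, 0.5
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def fit_stack(xs, ys, k=5):
    """Learn meta weights on out-of-fold predictions, then refit on all data."""
    fitters = [fit_line, fit_nearest]
    weights = fit_meta(out_of_fold(fitters, xs, ys, k), ys)
    models = [fit(xs, ys) for fit in fitters]
    return (lambda x: sum(w * m(x) for w, m in zip(weights, models))), weights


# Synthetic data, made up for the demo: a noisy quadratic.
random.seed(0)
demo_x = [random.uniform(-3, 3) for _ in range(60)]
demo_y = [x * x + random.gauss(0, 0.3) for x in demo_x]
demo_stack, demo_weights = fit_stack(demo_x, demo_y)
print("meta weights (line, nearest):", demo_weights)
```

Training the meta learner on out-of-fold predictions rather than in-sample ones is what ties stacking to cross-validation: the 1-nearest-neighbour learner would otherwise look perfect on its own training rows and receive all of the weight.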


Automate Stacking In Python: How to Boost Your Performance While Saving Time

#artificialintelligence

Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it. Stacked generalizations were first introduced in David Wolpert's 1992 paper Stacked Generalization, and their main purpose is to reduce the generalization error. According to Wolpert, they can be understood "as a more sophisticated version of cross-validation". While Wolpert himself noted at the time that large parts of stacked generalizations are "black art", it seems that building larger and larger stacked generalizations wins out over building smaller ones.


Stock Prediction with ML: Ensemble Modeling -- The Alpha Scientist

#artificialintelligence

Markets are, in my view, mostly random. Many small inefficiencies and patterns exist in markets which can be identified and used to gain a slight edge on the market. These edges are rarely large enough to trade in isolation - transaction costs and overhead can easily exceed the expected profits offered. But when we are able to combine many such small edges together, the rewards can be great. In this article, I'll present a framework for blending together outputs from multiple models using a type of ensemble modeling known as stacked generalization. This approach excels at creating models which "generalize" well to unknown future data, making them an excellent choice for the financial domain, where overfitting to past data is a major challenge.


Stacked Generalization: An Introduction to Super Learning

#artificialintelligence

Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into what is now known as "Super Learner". Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although relatively simple in nature, use of the Super Learner by epidemiologists has been hampered by limitations in understanding conceptual and technical details.
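
To make the idea of an "optimal weighted combination" concrete, here is a minimal sketch that assumes a library of only two candidate algorithms and mean squared error as the objective. With two candidates, the best convex weight has a closed form; a real Super Learner library is larger and needs a constrained optimizer. The candidate learners, the fold scheme, and every function name below are hypothetical and are not taken from any Super Learner package.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, v=5):
    """V-fold cross-validated predictions: each row is held out once."""
    n = len(xs)
    out = [0.0] * n
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, v):
            out[i] = model(xs[i])
    return out


def convex_weight(a, b, ys):
    """Weight w in [0, 1] on `a` that minimises the MSE of w*a + (1-w)*b.

    The unconstrained minimiser is sum((a-b)*(y-b)) / sum((a-b)^2); because
    the objective is a convex quadratic in w, clipping it to [0, 1] gives
    the constrained optimum.
    """
    num = sum((ai - bi) * (yi - bi) for ai, bi, yi in zip(a, b, ys))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:  # the two candidates agree everywhere
        return 0.5
    return min(1.0, max(0.0, num / den))


def super_learner(xs, ys, v=5):
    """Pick the weight on CV predictions, then refit both candidates on all data."""
    w = convex_weight(cv_predictions(fit_line, xs, ys, v),
                      cv_predictions(fit_mean, xs, ys, v), ys)
    line, mean = fit_line(xs, ys), fit_mean(xs, ys)
    return (lambda x: w * line(x) + (1 - w) * mean(x)), w
```

Swapping the objective (for example, to the area under the ROC curve) would change only `convex_weight`; the cross-validation step stays the same.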


Issues in Stacked Generalization

Journal of Artificial Intelligence Research

Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a "black art" in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
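
The difference between feeding the higher-level model confidences and feeding it hard predictions can be illustrated with a small sketch. The probability vectors below are made up, the function names are hypothetical, and the plain averaging combiner is only a fixed stand-in for a trained higher-level model; it is not the generalizer studied in the paper.

```python
from collections import Counter


def hard_labels(probas):
    """Each lower-level model's predicted class: argmax of its probabilities."""
    return [max(range(len(p)), key=p.__getitem__) for p in probas]


def majority_vote(probas):
    """Baseline combiner: the class predicted by most lower-level models."""
    return Counter(hard_labels(probas)).most_common(1)[0][0]


def level1_from_predictions(probas):
    """Higher-level input built from predictions only (one-hot per model)."""
    n_classes = len(probas[0])
    return [1 if c == label else 0
            for label in hard_labels(probas) for c in range(n_classes)]


def level1_from_confidences(probas):
    """Higher-level input built from the full class-probability vectors."""
    return [v for p in probas for v in p]


def average_confidence(probas):
    """Stand-in combiner (an assumption): argmax of the mean probabilities."""
    n_classes = len(probas[0])
    means = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)


# Three hypothetical lower-level models scoring one example over two classes.
# Two lean weakly towards class 0; the third is very sure of class 1.
example = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]
print(level1_from_predictions(example))
print(level1_from_confidences(example))
print(majority_vote(example), average_confidence(example))
```

The one-hot encoding discards how sure each model was, so any combiner built on it sees the same input as majority vote does; the confidence encoding preserves that information for the higher-level model to use.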